
    An intelligent audio workstation in the browser

    Music production is a complex process requiring skill and time to undertake. The industry has undergone a digital revolution but, unlike in other industries, the underlying process has not changed. However, intelligent systems, using the semantic web and signal processing, can reduce this complexity by making certain decisions for the user with minimal interaction, saving both time and investment on the engineers' part. This paper outlines an intelligent Digital Audio Workstation (DAW) designed for use in the browser. It describes the architecture of the DAW with its audio engine (built on the Web Audio API), using AngularJS for the user interface and a relational database
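
    As a rough illustration of the architecture described above, the sketch below wires a small track/master graph with the Web Audio API; the `MiniDAW` and `Track` names are invented for this example and do not come from the paper.

```typescript
// Minimal sketch of a browser audio engine on the Web Audio API.
// All class and method names here are illustrative, not the paper's code.
class Track {
  readonly input: GainNode;         // per-track fader
  private panner: StereoPannerNode; // per-track pan control

  constructor(private ctx: AudioContext, destination: AudioNode) {
    this.input = ctx.createGain();
    this.panner = ctx.createStereoPanner();
    this.input.connect(this.panner).connect(destination);
  }

  // Schedule a decoded clip on this track, `when` seconds from now.
  play(buffer: AudioBuffer, when = 0): void {
    const source = this.ctx.createBufferSource();
    source.buffer = buffer;
    source.connect(this.input);
    source.start(this.ctx.currentTime + when);
  }
}

class MiniDAW {
  private ctx = new AudioContext();
  private master = this.ctx.createGain(); // master bus
  readonly tracks: Track[] = [];

  constructor() {
    this.master.connect(this.ctx.destination);
  }

  addTrack(): Track {
    const track = new Track(this.ctx, this.master);
    this.tracks.push(track);
    return track;
  }
}
```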

    Improving peak picking using multiple time-step loss functions

    The majority of state-of-the-art methods for music information retrieval (MIR) tasks now utilise deep learning methods reliant on minimisation of loss functions such as cross entropy. For tasks that include framewise binary classification (e.g., onset detection, music transcription), classes are derived from output activation functions by identifying points of local maxima, or peaks. However, the operating principles behind peak picking are different to that of the cross entropy loss function, which minimises the absolute difference between the output and target values for a single frame. To generate activation functions more suited to peak picking, we propose two versions of a new loss function that incorporates information from multiple time-steps: 1) multi-individual, which uses multiple individual time-step cross entropies; and 2) multi-difference, which directly compares the difference between sequential time-step outputs. We evaluate the newly proposed loss functions alongside standard cross entropy in the popular MIR tasks of onset detection and automatic drum transcription. The results highlight the effectiveness of these loss functions in the improvement of overall system accuracies for both MIR tasks. Additionally, directly comparing the output from sequential time-steps in the multi-difference approach achieves the highest performance.
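
    The exact loss formulations are given in the paper; the sketch below is one plausible reading, written to show how a multi-time-step term differs from plain framewise cross entropy. The function names and the `radius` parameter are assumptions for illustration.

```typescript
// Standard framewise binary cross entropy for one output/target pair.
function bce(y: number, t: number, eps = 1e-7): number {
  const p = Math.min(Math.max(y, eps), 1 - eps);
  return -(t * Math.log(p) + (1 - t) * Math.log(1 - p));
}

// "Multi-individual" idea: sum cross entropies over neighbouring frames
// so the loss at frame i also sees its temporal context.
function multiIndividual(y: number[], t: number[], i: number, radius = 1): number {
  let loss = 0;
  for (let k = Math.max(0, i - radius); k <= Math.min(y.length - 1, i + radius); k++) {
    loss += bce(y[k], t[k]);
  }
  return loss;
}

// "Multi-difference" idea: also penalise mismatch between the *change*
// across sequential outputs and the change across sequential targets,
// encouraging peak-shaped activations rather than flat ones.
function multiDifference(y: number[], t: number[], i: number): number {
  if (i === 0) return bce(y[i], t[i]);
  const dOut = y[i] - y[i - 1];
  const dTgt = t[i] - t[i - 1];
  return bce(y[i], t[i]) + Math.abs(dOut - dTgt);
}
```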

    Intelligent audio plugin framework for the Web Audio API

    The Web Audio API introduced native audio processing into web browsers. Audio plugin standards have been created so developers can build audio-rich processors and deploy them into media-rich websites. It is critical that these standards support flexible designs with clear host-plugin interaction, to ease integration and avoid non-standard plugins. Intelligent features should be embedded into standards to help develop next-generation interfaces and designs. This paper presents a discussion of audio plugins in the Web Audio API, how they should behave, and how they can leverage web technologies, with an overview of current standards.
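
    To make the host-plugin interaction concrete, here is a hedged sketch of the kind of contract such a standard might define. The `WebAudioPlugin` interface and its members are hypothetical and are not taken from any existing standard.

```typescript
// Hypothetical host-plugin contract: the host only needs an input node,
// an output node, a descriptor, and a uniform way to set parameters.
interface PluginDescriptor {
  name: string;
  version: string;
  parameterNames: string[];
}

interface WebAudioPlugin {
  readonly descriptor: PluginDescriptor;
  readonly input: AudioNode;  // host connects the source here
  readonly output: AudioNode; // host routes this onward
  setParameter(name: string, value: number): void;
}

// A trivial gain processor satisfying the contract.
class GainPlugin implements WebAudioPlugin {
  readonly descriptor = { name: "Gain", version: "1.0.0", parameterNames: ["gain"] };
  readonly input: GainNode;
  readonly output: AudioNode;

  constructor(ctx: AudioContext) {
    this.input = ctx.createGain();
    this.output = this.input; // single-node processor
  }

  setParameter(name: string, value: number): void {
    if (name === "gain") this.input.gain.value = value;
  }
}

// Host side: insert any conforming plugin between a source and a destination.
function insertPlugin(source: AudioNode, plugin: WebAudioPlugin, dest: AudioNode): void {
  source.connect(plugin.input);
  plugin.output.connect(dest);
}
```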

    Trainable data manipulation with unobserved instruments

    Machine learning algorithms are the core components in a wide range of intelligent music production systems. As training data for these tasks is relatively sparse, data augmentation is often used to generate additional training data by slightly altering existing examples. User-defined techniques require a long parameter tuning process and typically use a single set of global variables. To address this, a trainable data manipulation system, termed player vs transcriber, was proposed for the task of automatic drum transcription. This paper expands the player vs transcriber model by allowing unobserved instruments to also be manipulated within the data augmentation and sample addition stages. Results from two evaluations demonstrate that this improves performance and suggest that trainable data manipulation could benefit additional intelligent music production tasks.
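
    As a hedged sketch of what trainable data manipulation can look like, the example below mixes per-instrument stems under learnable gains and blends in an extra (unobserved-instrument) sample; the shapes and names are illustrative, not the player vs transcriber model itself.

```typescript
// Parameters that a training loop could adjust instead of hand-tuning.
interface AugmentationParams {
  instrumentGains: number[]; // one gain per instrument stem
  additionLevel: number;     // mix level for added (unobserved) samples
}

// Mix instrument stems into one training signal under the current params,
// then blend in an extra sample (the "sample addition" stage).
function augment(
  stems: number[][],     // stems[i] = samples for instrument i
  extraSample: number[], // e.g. a hit from an unobserved instrument
  params: AugmentationParams
): number[] {
  const length = stems[0].length;
  const out = new Array<number>(length).fill(0);
  stems.forEach((stem, i) => {
    for (let n = 0; n < length; n++) out[n] += params.instrumentGains[i] * stem[n];
  });
  for (let n = 0; n < Math.min(length, extraSample.length); n++) {
    out[n] += params.additionLevel * extraSample[n];
  }
  return out;
}
```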

    Navigating Descriptive Sub-Representations of Musical Timbre

    Musicians, audio engineers and producers often make use of common timbral adjectives to describe musical signals and transformations. However, the subjective nature of these terms, and the variability with respect to musical context, often leads to inconsistencies in their definition. In this study, a model is proposed for controlling an equaliser by navigating clusters of datapoints, which represent grouped parameter settings with the same timbral description. The associated interface allows users to identify the nearest cluster to their current parameter setting and recommends changes based on its relationship to a cluster centroid. To do this, we apply dimensionality reduction to a dataset of equaliser curves described as warm and bright using a stacked autoencoder, then group the entries using an agglomerative clustering algorithm with a coherence-based distance criterion. To test the efficacy of the system, we implement listening tests and show that subjects are able to match datapoints to their respective sub-representations with 93.75% mean accuracy.
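
    The recommendation step lends itself to a short sketch: find the nearest cluster centroid to the current equaliser setting (in the reduced space) and nudge the setting toward it. The function names and the blend factor `alpha` are assumptions for illustration; the paper's autoencoder and clustering stages are not reproduced here.

```typescript
type Point = number[];

// Euclidean distance in the reduced parameter space.
function distance(a: Point, b: Point): number {
  return Math.sqrt(a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0));
}

// Index of the cluster centroid closest to the current setting.
function nearestCentroid(setting: Point, centroids: Point[]): number {
  let best = 0;
  let bestDist = Infinity;
  centroids.forEach((c, i) => {
    const d = distance(setting, c);
    if (d < bestDist) { bestDist = d; best = i; }
  });
  return best;
}

// Recommend a partial move toward the nearest centroid; alpha in [0, 1]
// controls how far toward the centroid the setting is pulled.
function recommend(setting: Point, centroids: Point[], alpha = 0.5): Point {
  const c = centroids[nearestCentroid(setting, centroids)];
  return setting.map((s, i) => s + alpha * (c[i] - s));
}
```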